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The order of genes in eukaryotic genomes has generally been assumed to be neutral, since gene order is largely scrambled 
over evolutionary time. Only a handful of exceptional examples are known, typically involving deeply conserved clusters 
of tandemly duplicated genes [e.g., Hox genes and histones). Here we report the first systematic survey of microsynteny 
conservation across metazoans, utilizing 17 genome sequences. We identified nearly 600 pairs of unrelated genes that have 
remained tightly physically linked in diverse lineages across over 600 million years of evolution, integrating sequence 
conservation, gene expression data, gene function, epigenetic marks, and other genomic features, we provide extensive 
evidence that many conserved ancient linkages involve (1) the coordinated transcription of neighboring genes, or (2) 
genomic regulatory blocks [GRBs) in which transcriptional enhancers controlling developmental genes are contained 
within nearby bystander genes. In addition, we generated ChiP-seq data for key histone modifications in zebrafish em- 
bryos, which provided further evidence of putative GRBs in embryonic development. Finally, using chromosome con- 
formation capture [3C) assays and stable transgenic experiments, we demonstrate that enhancers within bystander genes 
drive the expression of genes such as Otx and Islet, critical regulators of central nervous system development across 
bilaterians. These results suggest that ancient genomic functional associations are far more common than previously 
thought — involving -12% of the ancestral bilaterian genome — and that d5-regulatory constraints are crucial in de- 
termining metazoan genome architecture. 



[Supplemental material is available for this article.] 

The evolutionary and functional implications of high-order eukary- 
otic genome structures remain topics of contention, and have in- 
spired some of the most ambitious evolutionary hypotheses of the 
genetic and genomic eras (e.g., Doolittle 1978; Lynch 2007; Koonin 
and Wolf 2010). Largely absent from these debates has been the 
question of the local ordering of genes themselves within a genome, 
generally reflecting the common assumption that gene position 
within the genome is mostly a secondary concern and/or is usually 
not constrained (Koonin and Wolf 2010). 

However, a variety of studies suggest that the situation is not 
so simple. Gene order is nonrandom (Hurst et al. 2004; Oliver and 
Misteli 2005; Michalak 2008), with clustering of genes in some 
species according to metabolic pathways and/or similarity of ex- 
pression (Lee and Sonnhammer 2003; Fukuoka et al. 2004; Hurst 
et al. 2004). Particularly interesting are the genomic regulatory 
blocks (GRBs) (Becker and Lenhard 2007; Engstrom et al. 2007; 
Kikuta et al. 2007), which usually consist of (1) a trans-dev gene 
encoding a key transcriptional regulator with a complex spatiotem- 
poral expression pattern, often involved in embryonic development 
(Woolfe et al. 2005; Kikuta et al. 2007); and (2) nearby functionally 
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unrelated bystander gene(s), which contain ds-regulatory sequences 
for the trans-dev gene within their introns (Fig. IB). Gene linkage is 
thus important in GRBs since breakage of the microsyntenic asso- 
ciation would disrupt trans-dev-dssociated ds-regulatory functions. 

Despite these instances of nonrandomness, questions remain 
about the generality and strength of natural selection in main- 
taining gene order. First, similarity of expression between neigh- 
boring genes could be the result rather than the cause of their 
proximity (e.g., because of shared chromatin structure). Second, 
direct studies of selection have produced ambiguous results, with 
evidence for purifying selection on maintaining gene order in 
hemiascomycetous yeasts (Hurst et al. 2002; Fischer et al. 2006; 
Poyatos and Hurst 2007), but not in Drosophila (Weber and Hurst 
2011), even over short phylogenetic distances. Third, gene order is 
nearly completely scrambled between humans and model in- 
vertebrates (Putnam et al. 2008; Denoeud et al. 2010). Further- 
more, although whole-genome analyses of slow-evolving species 
and ancestral karyotype reconstructions showed that the chro- 
mosome-scale organization (so-called macrosynteny) has been 
largely conserved since the last common metazoan ancestor, these 
same studies found almost no traces of preservation of the spe- 
cific ancestral local gene order (microsynteny) across metazoans 
(Putnam et al. 2007, 2008; Srivastava et al. 2008, 2010). 

Even in cases where selection appears to conserve micro- 
synteny, questions remain about the generality and duration of 
these forces over evolutionary time. First, known GRBs are generally 
restricted to vertebrates (Kikuta et al. 2007) or insects (Engstrom et al. 
2007), with only three transphyletic GRBs described to date (Wang 
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Figure 1. Functional causes of conserved microsynteny and phyloge- 
netic distribution of conserved gene pairs. (A) (Coregulation) Two 
neighbor genes share one or more regulatory elements. Chromosomal 
breakpoints in the intergenic region of the two genes are opposed by 
selection since it would affect the coordinated expression of the two 
genes. (B) Genomic regulatory block (GRB): A bystander gene (green) 
contains regulatory elements in its introns that target a neighbor gene 
(red), often a trans-dev gene with key regulatory roles in animal de- 
velopment. The breakage of the association would result in affected reg- 
ulation of the trans-dev gene. (Based on Becker and Lenhard [2007]). (C) 
Consensus phylogeny of the studied metazoans, showing the total 
number of CAMPs in each species (in parentheses), and the minimum 
number of pairs at each node inferred by parsimony (in boxes). Branch 
lengths correspond to the fraction of gene pairs lost out of the total pairs 
present in the last common ancestor. Black dots at basal nodes indicate 
that branch length could not be estimated. 



et al. 2007; Irimia et al. 2012; Maeso et al. 2012a). Second, while 
tandemly duplicated Drosophila genes tend to remain linked 
(Quijano et al. 2008), such duplicates usually reflect recent dupli- 
cations (Thomas 2007; Irimia et al. 2008; Baldwin et al. 2009). This 
suggests that these associations are short-lived, again with only a 
few known exceptions (e.g., Hox genes and histone clusters). Third, 
while coexpression of linked genes is well-established (most strik- 
ingly, by usage of bidirectional promoters between genes) (Adachi 
and Lieber 2002), few gene linkages are conserved between 
eukaryotic kingdoms (Davila-Lopez et al. 2010). As such, the gen- 
eral paradigm holds that neutral processes dominate micro- 
synteny, particularly over long evolutionary times (Srivastava et al. 
2008; Koonin and Wolf 2010). 

We report the first genome-wide analysis of microsynteny 
conservation across metazoans, analyzing 1 7 species spanning 1000 
million years (MY) of evolution. We identified 795 groups of genes 
that are associated in four or more major animal taxa, 595 of which 



correspond to unrelated (nonparalogous) genes, which we term 
conserved ancestral microsyntenic pairs (CAMPs). Multiple lines of 
evidence suggest conservation of gene expression coregulation for 
some CAMPs, and of ancient GRBs involving key trans-dev genes 
for others. 

Results 

Identification of ancient conserved gene associations across 
metazoans 

For each pair of neighboring genes in the genome of a given species, 
we assessed whether the two genes were also tightly genetically 
linked (with four or fewer intervening genes) in other lineages (see 
Methods for details). We identified hundreds of pairs in each genome 
that were conserved across at least four major lineages, ranging from 
197 in the tunicate Ciona intestinalis to 842 in the cephalochordate 
Branchiostoma floridae, and 600 in human (Supplemental Table SI). 
To test this significance we randomized gene order within each 
chromosome/scaffold. Across 100 replicates per species, we found an 
average of 0.015 conserved pairs using the same criteria, a false dis- 
covery rate lower than 0.0002 for all species (Supplemental Table SI). 

We then merged the data for each species into a single data 
set, and filtered for potential annotation errors (see Supplemental 
File SI), yielding a final set of 795 unique groups of gene pairs (or 
three or more gene clusters) (Supplemental Table S2). Due to high 
duplication rates, precise orthology and paralogy relationships 
could not be established for 89/795 groups. These included pre- 
viously described examples, such as the histone clusters (Davila- 
Lopez et al. 2010) and cytochrome-P450 genes (Thomas 2007; 
Baldwin et al. 2009), as well as eight groups of clusters/pairs of 
important developmental genes {Hox, Wnt, Hes, En, Irx, Six, Tbx, 
and other homeobox genes). 

Another 110 groups included paralogous gene pairs whose 
group orthology could be more confidently assigned by BLAST. 
Interestingly, using Bayesian phylogenetic inferences (see Sup- 
plemental File SI for details), we found evidence that at least 68% 
of these pairs (Supplemental Table S3) likely arose by independent 
tandem duplications in the different lineages and are therefore not 
ancestrally linked (e.g., the homologs of the human gene C16orf5; 
Supplemental Fig. SI). This suggests a significant level of recurrent 
evolution of tandem gene duplicates across metazoan genomes 
(Maeso et al. 2012b). Alternatively, however, these patterns could 
also reflect events of gene conversion between ancestral duplicates 
within each species. 

Deeply conserved phylogenetically unrelated gene pairs 

We also identified 595 groups of nonparalogous gene pairs. These 
pairs ranged in degree of conservation, with 3 77 of them conserved 
in four out of the 11 studied major metazoan lineages, 153 in five, 
46 in six, and 19 in seven or more. Since these gene pairs did not 
result from tandem duplications, it is very unlikely that they have 
become linked independently in different lineages, and therefore 
are likely to represent ancient gene associations. We refer to these 
gene pairs as conserved ancestral microsyntenic pairs (CAMPs). 

To expand the phylogenetic coverage, we next assessed con- 
servation of these 595 CAMPs in four additional phylogenetically 
key species whose incomplete genome assemblies or annotations 
precluded their inclusion in the initial analyses: the hemichordate 
Saccoglossus kowalevskii, the tunicate Oikopleura dioica, and two 
outgroups, the sponge Amphimedon queenslandica and the unicel- 
lular opistokont Capsaspora owczarzaki. With this information, we 
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applied parsimony to reconstruct the evolutionary history of 
CAMPs. At least 378 CAMPs were already present at the origin of 
Eumetazoans (all animals but sponges and placozoans) and 593 at 
the origin of Bilateria (two pairs were specific to Protostomes). We 
also reconstructed the degree of CAMP disruption (fraction of an- 
cestral CAMPs lost) on each branch. Notably despite several po- 
tential sources of error (see Supplemental File SI), branches with 
high rates of CAMP disruption correspond closely to branches pre- 
viously shown to have high rates of other genomic changes (e.g., 
intron loss, gene duplications, genome rearrangements, nucleo- 
tide substitutions, etc.) (Kent and Zahler 2000; Lynch and Conery 
2000; Stein et al. 2003; Bourque et al. 2005; Bhutkar et al. 2008; 
Irimia and Roy 2008; Putnam et al. 2008; Denoeud et al. 2010) 
(Fig. IC). For example, generally slow-evolving species such as am- 
phioxus and the mollusk Lottia gigantea have retained 448-492 
CAMPs, whereas fast-evolving lineages such as nematodes (12 
CAMPs), flies (46), and C. intestinalis (54) have conserved far fewer 
(Fig. IC). These results are also in full agreement with previous 
studies on lineage-specific losses of chromosomal-scale gene link- 
age (macrosysteny) (Putnam et al. 2007, 2008; Srivastava et al. 
2008, 2010; Denoeud et al. 2010; Lv et al. 201 1). These results thus 
indicate that CAMP evolution closely follows overarching (though 
still poorly understood) trends of genome evolution. 



Functional causes for evolutionary conservation of ancient 
gene associations 

We next sought independent tests that 
CAMPs have been specifically main- 
tained by selection. We tested predictions 
made by each of the two known general 
hypotheses for functional roles of the 
conservation of microsynteny: coordi- 
nated transcriptional regulation (i.e., 
coregulation) and association of the 
genes as a GRB (Fig. 1). 



test) (Fig. 2A). A similar pattern was found for all species (reaching 
68.6% in L. gigantea) except for flies and tunicates (which have 
similar 3 '-3' and 5 '-5' orientations, see below). The 5 '-5' orien- 
tations are far more conserved (prediction 2): Orientation was 
conserved across all species sharing the CAMP for 51.4% of CAMPs 
with a 5 '-5' in human, compared with 15%-25% for other ori- 
entations (Fig. 2B); indeed, 5-5' CAMPs account for 75.8% of 
human CAMPs with fully conserved orientation. A similar result 
was observed for other species (again with the exceptions of flies 
and tunicates) and for highly conserved CAMPs (five or more 
lineages; data not shown). 

Next, we assessed whether CAMPs have shorter intergenic 
distances (prediction 3), which may facilitate coordinated expres- 
sion (Davila-Lopez et al. 2010). In humans, intergenic regions of 
CAMPs were 2.3-fold shorter than the control set (mean of 48 vs. 
113 Kbp, P = 2.3 X 10"^, KS test). This pattern was observed for all 
orientations, although it was strongest for 5 '-5' orientations (41 
vs. 126 Kbp, P = 1.2 X 10"^). Moreover, 5-5' CAMPs with con- 
served orientation had the shortest intergenic regions of all subsets 
(29 Kbp). Finally, 5 '-5' CAMPs were twice as likely to have very 
short intergenic regions (<1 Kbp, a signature of bidirectional pro- 
moters) (Adachi and Lieber 2002) than the control set (12.5% vs 
6%, P = 0.001, test). 

To study coexpression of CAMPs (prediction 4), we calculated 
gene expression correlations across 23,941 different human 
microarray experiments (conducted using the Affymetrix U133 
Plus 2.0 array). Since neighboring genes are known to have higher 



Coregulation of gene pairs 

Linked genes may share common cis- 
regulatory sequences, leading to coor- 
dinated gene expression (Fig. lA). In this 
case, potential chromosomal breakpoints 
in the intergenic region between such 
linked pairs will disrupt expression. We 
tested six predictions of this scenario: (1) 
enrichment of divergently transcribed 
genes (Le., in a head- to-head or 5 '-5' ori- 
entation) due, e.g., to a bidirectional pro- 
moter, (2) preferential conservation of this 
5 '-5' orientation, (3) short intergenic 
distances, (4) correlated expression pat- 
terns, (5) few insulator elements, and (6) 
highly conserved intergenic sequences (due 
to functional transcriptional elements). 

We found that most CAMPs were or- 
ganized in a 5 '-5' orientation (prediction 
1). In humans, 54.8% of CAMPs showed 
this orientation, compared with 26.7% of 
the control set (i.e., all nonparalogous 
neighboring gene pairs without con- 
served microsynteny; P = 1.37 x 10~^, 
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Figure 2. Predictive features of coregulated gene pairs. (A) Percentage of gene pairs oriented in 3'-3' 
(convergently transcribed), 5'-3' or 3'-5' (codirectionally transcribed), or 5'-5' (divergently tran- 
scribed) among human conserved gene pairs (darl< gray) and control set (all nonconserved, non- 
paralogous gene pairs; light gray). Conserved pairs are highly enriched in 5'-5' orientation (P = 1 x 
10"^, test). (B) Percentage of pairs with fully conserved orientation across species for human con- 
served gene pairs; 5'-5' pairs are more conserved. (C) Cumulative plot of ranks of ranked correlation 
coefficients of conserved pairs among the control data set (nonconserved neighbor pairs with the same 
orientation and similar intergenic distance; black) and random gene pairs (gray). The red line shows the 
expected cumulative pattern if coexpression was similar to the controls, and the area above the line the 
excess of highly coexpressed pairs. (D) Percentage of conserved pairs with ranks in each of the decimal 
bins. The red line (1 0%) indicates the expected values. Conserved pairs were significantly enriched in 
ranks of the last decimal bin (91-100, red arrowhead). 
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coexpression than unlinked genes (Fukuoka et al. 2004; Hurst et al. 
2004), we compared each CAMP with a control set of 100 phylo- 
genetically unrelated adjacent gene pairs with the same orienta- 
tion and similar intergenic distance (see Methods). CAMPs showed 
significantly higher coexpression levels (average Spearman corre- 
lation coefficient; r = 0.30) than both the control sets (r = 0.21, P = 
8.7 X 10"^ KS test) and random gene pairs (r = 0.14, P = 9.6 X 
10"^^). For each CAMP, we then ranked its coexpression value 
relative to its control set and also relative to 100 random gene pairs. 
The cumulative plot shows an excess of high ranks with respect to 
both the control and the random set (Fig. 2C, black and gray areas 
above the red line, respectively). This excess is mostly due to an 
enrichment in the most highly coexpressed decile (ranks 91-100, 
red arrowhead in Fig. 2D). This pattern was observed for all ori- 
entations, but was strongest for 5 '-5 'gene pairs (data not shown). 
Finally, to assess whether this may be a general pattern across 
metazoans, we performed a similar analysis for D. melanogaster, 
using 1909 published microarray experiments. These showed even 
stronger coexpression, with 81% of the pairs ranking in the top 
half of their 100 matched control pairs (50% expected) and 25% in 
the most highly coexpressed decile (10% expected; Supplemental 
Fig. S2). This excess of coexpression of CAMPs relative to other 
neighboring pairs is consistent with selection to maintain some 
CAMPs because of shared transcriptional regulatory elements. 

We next investigated whether intergenic regions separating 
CAMPs show few insulators (prediction 5), which may impose 
a barrier for coordinated expression. Using insulators defined by 
chromatin signatures in nine human cell lines (Ernst et al. 2011), 
we found that CAMPs had significantly fewer intergenic insulators 
than their matched controls (0.062 vs. 0.076 insulators/Kbp, P = 
6.4 X 10~^, KS test). This was more evident for 5 '-5' orientations 
(0.054 vs. 0.070 insulators/Kbp, P = 1.4 X 10"^) and for relatively 
short intergenic regions (<50 Kbp, 0.047 vs. 0.073 insulators/Kbp, 
P = 2.9 X lO"'^). In contrast, CAMPs showed a higher density of 
transcriptional enhancers (Ernst et al. 2011) in intergenic regions 
(0. 180 vs. 0. 155 enhancers/Kbp, P = 0.022), in particular for highly 
coexpressed genes (see below). Finally, intergenic sequence con- 
servation was also significantly higher in CAMPs than in non- 
conserved pairs (average mammalian phastCons score of 0.18 vs. 
0.14, P = 2.9 X 10"^), fulfilling prediction 6. 

In summary, coordination of gene expression is likely to ex- 
plain the preservation of many CAMPs. One example involves the 
mitochondrial chaperonin genes Hspel and Hspdl, which are 
found together in nearly all studied species, from humans to the 
unicellular holozoan Capsaspora (and also in fungi) (Davila-Lopez 
et al. 2010), in a conserved 5 '-5' orientation. In humans, they have 
a very short intergenic distance (<1 Kbp) and are known to share 
a bidirectional promoter (Hansen et al. 2003). In agreement with 
this, both genes show high coexpression levels in our microarray 
analysis (r = 0.81, rank = 99). Some other strongly coexpressed 
pairs in humans include: UBA3-ARL6IP5 (r = 0.84, rank = 100), 
HNRNPA2B1-CBX3 (r = 0.84, rank = 100), HADHA-HADHB (r = 
0.79, rank = 100), and ZMYM3-N0N0 (r = 0.83, rank = 100). 

Genomic regulatory blocks [GRBs] 

Another major cause of conserved gene linkage may be preserva- 
tion of GRBs. The typical ORB comprises a trans-dev gene and one 
or more functionally unrelated bystander genes, whose introns 
harbor c/s-regulatory sequences that act on the trans-dev gene 
promoter (Fig. IB). Thus, separation of the genes in a ORB is likely 
to result in misregulation of the trans-dev gene. Several features of 
known GRBs are not expected from gene pairs with coordinated 



expression, yielding four specific predictions: (1) presence of a trans- 
dev gene and one or more non-trans-dev bystander genes, (2) ex- 
tensive intronic sequence in the bystander gene, (3) evolutionary 
conservation of intronic sequence in the bystander gene, specifically 
in relatively short highly conserved noncoding regions (HCNRs), 
and (4) transcriptional enhancers (acting on the trans-dev gene) in 
bystander introns. 

In order to test these predictions, we first identified trans-dev 
genes using Gene Ontology (see Methods). A total of 29.4% of 
conserved CAMPs in human were composed of a trans-dev and 
a non-trans-dev gene, compared with 19.5% in the control set (P = 
6.0 X 10~^, test). Similar patterns were found in all species, with 
even higher proportions of trans-dev-plus-nondevelopmentai 
CAMPs in fast-evolving species such as tunicates and flies (54.6% 
and 42.9%, respectively). This is consistent with the lower fraction 
of 5 '-5' (putatively coregulated) CAMPs in these lineages. 

We next studied bystander gene's introns (prediction 2). We 
found higher intron number (P = 7 X 10~^, KS test) and average 
intron length (twice as long; P = 1.7 X 10~^) in bystander genes 
relative to controls (Fig. 3A,B). Greater intron length mostly 
reflected a subset of very long introns (a feature generally associ- 
ated with the presence of HCNRs) (Irimia et al. 2011): Bystander 
genes have an average of 2.7 introns longer than 10 Kbp, compared 
with 0.9-1.1 in the other gene sets (P = 2.3 X 10"^) (Fig. 3C). 

We also found higher conservation of intron sequences in 
trans-dev and bystander genes than in the respective control sets 
(prediction 3; P = 0.0017 and P = 3.2 X 10"^, respectively, using 
mammalian phastCons scores) (Siepel et al. 2005). In contrast, 
CAMPs with no trans-dev gene showed no greater intronic con- 
servation than their matched controls (Fig. 3D). Because enhancers 
likely constitute only a small fraction of all intronic sequence, we 
expect an even stronger enrichment of HCNRs than overall se- 
quence conservation. Consistent with this, putative bystander 
genes in CAMPs had six to 10-fold more HCNRs than in non- 
conserved pairs, both for ancient conserved noncoding elements 
(aCNEsO) (Lee et al. 2011) (6.2 vs. 1.0 aCNEs/Mbp, P = 1.4 X 10"^) 
(Fig. 3E), and for VISTA HCNRs (Visel et al. 2007) (1.6 vs. 0.2 
VISTAe/Mbp, P = 0.036). Finally, following prediction 4, CAMP's 
bystander introns had a higher density of functionally defined 
transcriptional enhancers (Ernst et al. 2011) than the control set 
(0.194 vs. 0.180 enhancers/Kbp, P = 1.5 X 10"^) (Fig. 3F). These 
four lines of evidence thus suggest that a subset of CAMPs are 
likely GRBs. 

Experimental evidence for GRBs in vertebrate development 

The ancient GRBs we identified typically involve genes from well 
known developmental gene families, such as Fox, Fgf, Tbx, Sox, 
Smad, etc. (Supplemental Table S4). Therefore, despite the signif- 
icantly higher density of enhancers active in cell lines (Ernst et al. 
2011), the major effect of bystander-contained enhancers is ex- 
pected during embryonic development. In order to test this hy- 
pothesis, we used chromatin immunoprecipitation followed by 
high-throughput sequencing (ChlP-seq) for three key epigenetic 
marks in zebrafish embryos at 24-h post-fertilization (hpf): Histone 
3 Lysine 4 trimethylation (H3K4me3, marks active promoters), 
Histone 3 Lysine 4 monomethylation (H3K4mel, often marks 
active enhancers when not overlapping with H3K4me3), and 
Histone 3 Lysine 27 trimethylation (H3K27me3, often marks in- 
active promoters, and indicates genes with tissue-specific expres- 
sion patterns in whole embryos) (Turner 2007; Akkers et al. 2009; 
Margueron and Reinberg 2010). We searched for conservation of 
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Figure 3. Different functional and evolutionary features of intragenic sequences of conserved pairs. (A) Average intron number for each gene type. 
(Dark gray) Conserved pairs; (light gray) control set. Putative human bystander genes (nondevelopmental genes associated with trans-dev genes in 
conserved pairs) have significantly more introns on average than the control set (P= 7 X 1 0~^, KS test), whereas associated trans-dev genes have fewer (P= 
6 X 1 0""^, KS test). Genes in highly coexpressed or lowly coexpressed conserved pairs have similar average intron numbers than the control set. (5) Average 
intron length (in Kbp). Putative bystander genes have significantly longer introns than the other genes (P = 2 X 10~^ KS test). (C) Average number of 
introns longer than 1 0 Kbp. Bystander genes have nearly three times more long introns than other genes (P = 2 X 1 0~^). (D) Average mammalian-wide 
phastCons score of intronic regions of different types of conserved gene pairs. Both trans-dev and bystander genes from conserved pairs show higher 
intronic sequence conservation than the nonconserved genes (P = 0.002 and P = 3 x 10 ^ KS test), whereas conserved highly coexpressed non- 
developmental gene pairs show higher conservation than the control set at the intergenic region. (£) Density of ancient conserved noncoding elements 
(aCNEs) per Mbp in introns of different types of conserved gene pairs. (F) Density of functionally defined strong enhancers per Kbp in introns of different 
types of conserved gene pairs. Error bars correspond to standard errors. 



the 593 bilaterian CAMPs in the zebrafish genome, finding 260 
conserved pairs of 205 unique groups, of which 29.2% included 
one trans-dev gene. Mapping of H3K4mel-i-/H3K4me3- peaks shows 
that putative bystanders for CAMPs contain approximately four 
times more active enhancers than for nonconserved pairs (P = 
0.005, KS test) (Fig. 4A), and with a higher density (P = 0.003) 
(Fig. 4B). These results suggest that conserved GRBs have complex 
ds-regulatory landscapes in zebrafish development. Importantly, 
we found that H3K27me3 is significantly increased in trans-dev, 
but not in bystander genes in whole zebrafish embryos (Fig. 4C-E). 
This suggests that, globally, the trans-dev genes of GRBs, but not the 
enhancer-containing bystanders, have complex tissue-specific 
expression patterns (Akkers et al. 2009). 

A striking case of conserved ORB involves the ISL LIM ho- 
meobox (7s/). Isl plays important conserved roles in animal de- 
velopment, in particular in neuron ontogeny in diverse phyla 
(Thor and Thomas 1997; Jackman et al. 2000; Voutev et al. 2009; 
Liang et al. 201 1). We found that Isl genes are linked to Scaper/ssp3 
(S-phase cyclin A-associated protein in the ER) genes in nearly all 
studied species, from sponges to humans, for over 1000 MY of 
evolution (Fig. 5A). In human, SCAPER spans —500 Kbp and con- 
tains 16 introns longer than 10 Kbp, as expected for a bystander. 
Furthermore, the expression patterns of these two genes are very 
different in flies (r = 0.08), humans (r = -0.07), and zebrafish (Fig. 
5B). In 24-hpf zebrafish embryos, isl2a is expressed in specific ce- 
phalic neuronal domains and in discrete neurons in the spinal cord 



(arrow), and in hindgut (red arrow) — a pattern conserved in other 
animal lineages (Jackman et al. 2000; Gelbart and Emmert 2010; 
Graveley et al. 201 1) — whereas scaper is not expressed at this stage. 

In zebrafish, ChlP-seq data for H3K4mel showed three po- 
tential enhancers within scaper at 24 hpf (red bars in Fig. 5C), 
whereas H3K4me3 indicated that isl2a's promoter is active but 
scaper's is not (Fig. 5C), suggesting that the active enhancers within 
scaper may be acting on isl2a. We thus used the GFP reporter vector 
ZED (Bessa et al. 2009) to examine the enhancer activity of the 
three H3K4mel+/H3K4me3- peaks in stable zebrafish transgenic 
lines. One of them (red asterisk in Fig. 5C) promoted GFP expres- 
sion in different neuron populations and hindgut, two domains 
coexpressing the endogenous Isl2a protein, as shown by ISH and 
immunocolocalization (Fig. 5; Supplemental Fig. S3). These results 
strongly suggest that at least this scaper-contdined enhancer is 
acting specifically on isl2a, and provide a plausible explanation for 
the conservation of this gene association. 

Another interesting case is the homeobox transcription factor 
Otx (Fig. 6). This gene is involved in establishing a deeply con- 
served early anterior-posterior (A-P) brain patterning (Simeone 
et al. 1992; Hirth et al. 2003; Castro et al. 2006; Irimia et al. 2010) 
and is expressed in the anterior nervous system of nearly all studied 
bilaterians (Simeone et al. 1992; Williams and Holland 1996; Hirth 
et al. 2003; Lowe et al. 2003; Scholpp et al. 2007). We found that 
Otx is linked to Ehbpl (EH domain binding protein 1) in nearly all 
species, from placozoans to humans, spanning over 700 MY (Fig. 6A). 
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the identified enhancer, and revealed 
another potential c/s-regulatory element 
within the ehbpl intron that clearly con- 
tacted the otxlb promoter (Fig. 6C). Col- 
lectively, these results show that ehbpl 
contains c/s-regulatory elements that spe- 
cifically regulate the expression of otxlb 
in deeply conserved domains. 

Discussion 



trans-dev bystander 



others 



trans-dev bystander others 



— trans-dev genes 

— bystander genes 




-30.000 



30,000 



152 

H3K27me3 



31 _ 

H3K27me3 



foxdb 



gmds 



20 kbi 

15.760,0001 15.770,0001 



fgf20a 



efha2 



Figure 4. Differential epigenetic marks in in bystander and trans-dev genes in early zebrafish de- 
velopment. (A,B) Average number (A) and density (peaks/Kbp, B) of H3K4mel +/H3K4me3- peaks 
(putative active enhancers) in 24-hpf zebrafish embryos within trans-dev, bystander, and other non- 
developmental genes for conserved pairs (dark gray) and for nonconserved pairs (light gray). Error bars 
correspond to standard errors. (C) Average number of reads for H3K27me3 ChlP-seq around tran- 
scription start sites show an specific deposition of this epigenetic mark in trans-dev genes. (D,f) Two 
examples of differential distribution of H3K27me3 in a trans-dev gene {foxcl b, D and fgf20a, E) and its 
nondevelopmental partner {gmds, D and efha2, f ). 



As in the case of Isl-Scaper, the expression of OTXl and EHBPl is 
very different in human (r = -0.12) and in zebrafish (Fig. 6B). 
During early zebrafish development, only otxlb is expressed in the 
embryo, mainly in the cephalic region. In addition, zebrafish ehbpl 
also has several long introns that contain potential enhancers, as 
indicated by high levels of H3K4mel without H3K4me3 (Fig. 6C). 
We therefore generated stable transgenic zebrafish reporter lines 
for three of these H3K4mel peaks, one of which drove strong and 
consistent expression to the classical conserved Otx anterior do- 
main in the central nervous system. Next, to test whether the 
remaining ehbpl intronic sequences may also contain other otxlb 
d5-regulatory elements, we performed chromosome conformation 
capture (3C) assays (Hagege et al. 2007) to explore physical interaction 
between several regions of the ehbpl gene (red arrows in Fig. 6C) 
and the otxlb promoter. This assay confirmed the interaction of 



We have reported a large number of genes 
belonging to conserved ancestral micro- 
syntenic pairs (CAMPs) shared across sev- 
eral deeply diverged metazoan lineages. 
In contrast to the few previously described 
cases of deeply conserved microsynteny, 
nearly all of which involve tandemly du- 
plicated gene pairs or clusters, we found 
nearly 600 pairs of unrelated genes that 
are closely physically linked in several 
major bilaterian lineages spanning over 
600 MY of animal evolution (half of them 
already present in the nonbilaterian an- 
cestors, 700-1000 MY ago). Moreover, 
these numbers are likely underestimates, 
since most genome sequence assemblies 
used in our study are highly fragmented 
into relatively small scaffolds (Supplemen- 
tal Table SI). Thus, much of the micro- 
synteny in most organisms could not be 
truly evaluated and was conservatively 
considered to be nonconserved. 

These results are quite unexpected in 
light of previous studies of pairwise con- 
servation. Current and previous compar- 
isons between pairs of distantly related 
species show very little conservation of 
local microsynteny (e.g., only ~l%-5% 
between humans and other lineages; Sup- 
plemental Fig. S4). Indeed, even studies of 
slow-evolving metazoans, which found 
conservation of chromosome-level link- 
age (e.g., genes on the same chromosome 
in humans also tend to be on the same 
chromosome in cnidarians) found very 
little or no microsyntenic conservation 
across several species (Putnam et al. 2007, 2008; Srivastava et al. 
2008, 2010). Our finding that comparable fractions of gene link- 
ages (e.g., including l%-2% of genes in humans; Supplemental 
Table SI) have been independently conserved in several different 
lineages is therefore quite unexpected, and implies that the same 
specific subset of gene linkages have been actively retained in 
widely diverged lineages (including examples even in typically 
fast-evolving species) (Chavali et al. 2011; Lv et al. 2011; Weber 
and Hurst 2011). Accordingly, we have provided extensive evi- 
dence that many of these associations have been conserved due to 
functional reasons. First, many CAMPs show high coexpression 
levels in humans and/or flies, suggesting that they may share cis- 
regulatory inputs resulting in coordinated transcription. Second, 
we found more than 100 putative ancient GRBs, often involving 
important developmental regulators, substantially adding to the 
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Figure 5. Functional characterization of the GRB of isl2a-scaper in zebrafish. (A) Phylogenetic distribution of the GRB across the studied metazoan 
species. The GRB was only not conserved in C elegans and N. vectensis. (B) Top and middle panels are 24-h post fertilization (hpf) zebrafish embryos 
showing the expression patterns of isl2a and scaper. isl2a is expressed in discrete neurons in the spinal cord (arrow) and in the proctodeum (arrowhead), 
while scaper is not detectable at this stage. The third panel is an embryo at a similar developmental stage showing GFP expression promoted by an 
enhancer located within the intron of scoper (asterisk in Q. This enhancer is active in /s/2o-expressing territories. The bottom panel shows immunodetection 
of the transgenic GFP (green), the endogenous Islet proteins (red) and the overlay between the two showing coexpression. (C) Distribution of H3K4me3, 
H3K4me1, and H3K27me3 tracks along the /s/2o-scoper GRB in 24-h pf zebrafish embryos. H3K4me1 peaks tested for enhancer activity in zebrafish stable 
transgenic lines are shaded in red. H3K4me3 and H3K27me3 distribution show that scaper is inactive and this stage and that isl2a is tissue specific, 
suggesting that the H3K4mel -positive enhancer may be acting on the active isl2a promoter. (Below) Conservation track from the UCSC Genome Browser. 



three previously known trans-phyletic GRB (Wang et al. 2007; 
Irimia et al. 2012; Maeso et al. 2012a). The deep conservation of 
GRBs suggests that they may underlie the remarkable conservation 
of some developmental genetic programs, such as the A-P pat- 
terning of the bilaterian central nervous system (in the case of Otx- 
Ehbpl) or neuronal ontogeny (Isl-Scaper). At the same time, the 
conservation of some GRBs in lineages with very different body 
plans or cell types is intriguing. For example, in the case of Isl, it 
is not clear which common regulatory role may be responsible 
for the conservation of the association between bilaterians and 
sponges, which have no proper neurons (Hooper and Van Soest 
2002; Sakarya et al. 2007). 



The phylogenetic distribution of CAMPs also shows that, 
despite their deep conservation, many of the associations have 
been repeatedly lost in different lineages. This was observed even 
for extremely conserved CAMPs such as the coexpressed mito- 
chondrial chaperonins HSPEl-HSPDl (lost in flies) and the GRBs 
described in detail in the present study (Figs. 5 A, 6A). Different 
causes may underlie the loss of these associations. For example, 
major modification of body plans may render some of the regu- 
latory constraints obsolete, and thus the microsynteny can be lost, 
presumably without selective cost (as in the case of Hox clusters) 
(Duboule 2007). Alternatively, associations within GRBs could be 
disentangled through the acquisition of genetic redundancy 
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Figure 6. Functional characterization of the GRB of otxib-ehbpl in zebrafish. (A) Phylogenetic distribution of the GRB across the studied metazoan species. The 
GRB was not conserved in D. melanogaster, C elegans, and N. vectensis. (B) Top and middle panels are zebrafish embryos at 24 hpf showing the expression of otxib 
and ehbpl genes, otxib is detected in the anterior brain, while ehbpl is expressed in the yolk. (Lower par\e\) GFP expression promoted by an enhancer located 
within the intron of ehbpl (asterisk in Q. This enhancer is active in most tissues expressing otxib. (C) Distribution of H3K4me3, H3K4me1, and H3K27me3 tracks 
along the otxlb-ehbpl GRB in 24-hpf zebrafish embryos. H3K4me1 peaks tested for enhancer activity in zebrafish stable transgenic lines are shaded in red. The 
regions that physically interact in 3C assays with the otxib promoter are shaded in blue. H3K27me3 distribution indicates that otxib, but not ehbpl, has tissue- 
specific expression; hence, H3K4me1 enhancers located in ehbpl introns are likely acting on otxib. (Below) conservation track from the UCSC Genome Browser. 
(D) Graph showing a 3C experiment to detect interaction between different ehbpl intronic regions and the otxib promoter in 24-hpf embryos. A fixed primer 
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(McEwen et al. 2006; Kikuta et al. 2007; Navratilova et al. 2010; 
Goode et al. 2011; Maeso et al. 2012a). Complex genes are often 
regulated by a redundant set of c/s-regulatory elements, which 
increase robustness, and are continuously evolving (Jeong et al. 
2006; Hong et al. 2008; Frankel et al. 2010; Perry et al. 2010; Schmidt 
et al. 2010). For example, if a new, redundant c/s-regulatory module 
arises outside of the bystander gene (e.g., in the intergenic region), 
the bystander gene could be now translocated without affecting 
the trans-dev gene's regulation. In a more complex scenario, du- 
plication of the gene pair (either by whole-genome or segmental 
duplication) can aid this process. In coregulated gene pairs, each 
gene of the pair could be reciprocally lost from one of the dupli- 
cated regions while keeping the common ds-regulatory elements 
in both, therefore becoming two nonassociated genes with fully 
functional regulatory landscapes. In the case of GRBs, the coding 
sequences of extra bystander gene copies may be erased, while the 
trans-<iev-associated regulatory elements are conserved (McEwen 
et al. 2006; Kikuta et al. 2007; Maeso et al. 2012a). 

Finally, the extent of ancient conserved microsynteny we 
have uncovered is even more striking when considering that the 
ancestral bilaterian likely had fewer than 10,000 unique genes 
(Miller and Ball 2009): Thus, over 12% of these are involved in 
close microsyntenic relationships conserved in multiple lineages 
to this day. It can thus be argued that microsynteny is among the 
most conserved features of metazoan genomes, and that ancient 
ds-regulatory inputs may be far more common than currently 
appreciated (Royo et al. 2011). We expect that as metazoan ge- 
nomes continue to be sequenced at an ever-faster rate, many more 
microsyntenic relationships will be discovered, and many more 
details surrounding their role as key components of the genome's 
architecture will be revealed. 



Methods 

Genome-wide search for deeply conserved gene pairwise 
associations 

We downloaded full genome annotations for the initial 13 species 
(Supplemental File SI). For multiple-transcript genes we used only 
the longest protein isoform. Homologs were identified by pairwise 
BLASTP [£-value <10~^ without filtering for low complexity se- 
quences (-F F)]. 

For each pair of immediately adjacent protein-coding genes 
(excluding noncoding RNA genes, pseudogenes, etc.) in each ge- 
nome, we asked whether the first or second best BLAST hits for 
both proteins were also neighboring in each of the 12 other ge- 
nomes. This was defined as on the same chromosome/scaffold 
with four or fewer intervening genes, a threshold chosen to deal 
with: (1) common annotation errors (e.g., spurious automatic gene 
models, split genes), and (2) the fact that GRBs often include 
nonadjoining genes (Kikuta et al. 2007). Cutoffs of three to 10 
intervening genes yielded similar results; Supplemental Fig. S5). 
Based on randomization simulations (see below), we considered 
a gene pair conserved if it was linked in four total lineages (except 
vertebrates, each species was considered a distinct lineage, since 
pairwise species comparisons showed no enrichment for pair con- 
servation due to phylogenetic proximity, except within vertebrates; 
Supplemental Fig. S4). To determine "conservation" by chance, we 
randomized gene order within each chromosome/scaffold for each 
of the 12 species, with 100 replicates. These simulations indicated 
that a cutoff of four or more lineages with four or fewer intervening 
genes yields a false-positive discovery rate of <0.0002 per gene pair 
for all species (Supplemental Table SI). 



Generation of a unique data set of ancient microsyntenic 
gene pairs 

Conserved pairs for each of the 13 species were then merged into 
a single data set of nonredundant groups. Each unique group 
contained the syntenic pairs for all species in which it was con- 
served, including duplicates of the pairs within species (e.g., 
paralogons in vertebrates resulted from the two rounds of whole- 
genome duplication), if any. Next, we assessed whether the genes 
were related (paralogous pairs), and filters were applied to account 
for reciprocal blast consistency and common annotation errors 
(see Supplemental File SI for details). 

To reconstruct the history of linked duplicate genes, we per- 
formed Bayesian phylogenetic inferences for genes for all species 
for each pair, to distinguish (1) independent tandem duplications 
(clustering of genes by species on phylogenetic trees) and (2) 
retained ancestral linkage. Statistical significance was tested by 
comparison to randomized trees with the same topology (see 
Supplemental File SI for details). 

We also studied disruption of CAMPs. First, we assessed 
conservation in four additional species: S. kowalevskii, O. dioica, 
A. queenslandica, and C. owczarzaki using tBLASTN against the as- 
sembled contigs (Supplemental File SI). Then, we applied parsi- 
mony to infer the number of CAMPs that were present at each 
node of a consensus phylogenetic tree, and estimated the fraction 
that were disrupted along each branch. 

Coexpression of CAMPs using microarray data 

We downloaded the full set of experiments from Af fymetrix Human 
Genome U133 Plus 2.0 Array and Affymetrix Drosophila Genome 
2.0 Array (GEO accession nos. GPL570 and GPL1322). After ex- 
cluding experiments with missing probes, we obtained a matrix 
with 54,608 probes with data for 23,941 experiments in humans, 
and 13,935 probes with data for 1909 experiments in Drosophila. 
The expression levels were converted to ranks within each exper- 
iment to correct for different measurement and normalization 
methods. 

Of the CAMPs with no intervening genes, 279 in humans and 
27 in Drosophila had at least one probe for each gene. For these, we 
calculated the correlation coefficient for gene expression between 
the two genes. For comparisons, we generated two control sets for 
each conserved pair: (1) 100 nonparalogous, nonconserved gene 
pairs with the same orientation and the most similar intergenic 
distances; (2) 100 pairs of two randomly selected nonsyntenic 
genes. Then, the correlation coefficient of each conserved pair was 
ranked with respect to each of its two control sets. When multiple 
probes covered a gene, we used the combination of probes that 
gave the highest correlation (for both test and control sets). Using 
the average between probes gave the same pattern of higher 
coexpression between conserved pairs. 

Study of genome structure 

Orientation and intergenic distance were calculated from GFF/GTF 
annotation files, using the stable transcript for Ensembl genomes, 
and the best gene models for other genomes. We merged all gene 
isoforms into a single intron-exon structure to determine intron 
number/lengths. Global sequence conservation was calculated 
using the phastCons46way Placental scores from UCSC Comparative 
track (http://genome.ucsc.edu/). Only sites with scores >0 (confi- 
dently aligned across genomes) were used for calculations. HCNRs 
were obtained from two sources: (1) vistaEnhancers track at UCSC, 
and (2) ancient conserved noncoding elements (aCNEs) (Lee et al. 
2011). Active strong enhancers and insulators for nine human 
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cell lines were obtained from Ernst et al. (2011). Data for all cell 
lines were merged into a unique, nonredundant set of coordinates. 

To study GRBs, we defined developmental (trans-dev) genes as 
genes with GO terms embryo development (00:0009790) and/or 
organ development (00:0048513). For comparison, we also defined 
two sets of CAMPs composed of two nondevelopmental genes, based 
on the analyses of coexpression presented above: pairs of highly 
coexpressed nondevelopmental genes (r > 0.40, likely enriched in 
gene pairs conserved due to coregulatory reasons) and of weakly 
coexpressed nondevelopmental genes (r < 0.05). In addition, each of 
these conserved pair sets had its corresponding control set, consisting 
of similar nonparalogous syntenic pairs (i.e., bystander plus trans-dev 
or highly/lowly coexpressed) that are not evolutionarily conserved. 

Experimental analyses in zebrafish embryos 

Chromatin immunoprecipitation (ChIP) was performed following 
the protocol described in Wardle et al. (2006) with minor modifi- 
cations, and Oenome Analyzer (Illumina) ChlP-seq was performed 
as described in Bogdanovic et al. (2012) (see Supplemental File SI 
for details). Highly enriched regions (peaks) of histone methyla- 
tion were obtained by the MACS (v. 1.3.3) algorithm (Zhang et al. 
2008) using standard settings with one modification (mfold = 20). 
Twenty-five randomly selected peaks were verified by qPCR and 
compared with their random controls (false-positive discovery rate 
[FDR] <0.04). The PCRs were performed on 1:50 dilutions of the 
Chip samples using the 01 000 Thermal Cycler (BioRad). 

For each H3K4mel -positive peak that was tested in stable 
zebrafish transgenic assays, we designed primers to span the whole 
region plus —100 nt at each side (primer sequences are available in 
Supplemental Table S5). Specific details for cloning, preparation, 
and injection of candidate enhancer sequences into zebrafish eggs, 
as well as for in situ hybridization and immunostaining, are pro- 
vided in Supplemental File SI; in all cases, previously described 
protocols were followed with minor modifications (Kawakami 
2004; Tena et al. 2007; Bessa et al. 2009). 30 assays were performed 
as referred in Hagege et al. (2007) and Tena et al. (2011), (see Sup- 
plemental File SI). A set of locus-specific primers (Supplemental 
Table S5) was designed to perform qPORs to measure relative en- 
richment in each ligation product. Negative control primers were 
designed —30 Kbp upstream of and downstream from the regions 
of interest. POR values were normalized with primers for Ercc3. 
ChlP-seq data for H3K4mel and H3K4me3 were downloaded from 
the NCBI Oene Expression Omnibus (OEO) (OSE32483, http:// 
www.ncbi.nlm.nih.gov/ geo/ query/ acc.cgi?token=vnyhbkqayaioczc& 
acc=OSE32483). 

Data access 

H3K27me3 short read data have been submitted to the NCBI Oene 
Expression Omnibus (OEO) (http://www.ncbi.nlm.nih.gov/geo/) 
under accession no. OSE35050. 

Acknowledgments 

M.I., M.S.A., S.W.R., and H.B.R were funded by NIH grant 
1R21HO005240-01A1. H.B.R is an Alfred R Sloan Fellow and Pew 
Scholar in the Biomedical Sciences. J.J.T., A.F-M., O.B., E.C-M., and 
J.L.O-S. were funded by grants BFU2010-14839, CSD2007-00008, 
and Proyecto de Excelencia CVI-3488. 

References 

Adachi N, Lieber MR. 2002. Bidirectional gene organization: A common 
architectural feature of the human genome. Cell 109: 807-809. 



Akkers RC, van Heeringen SJ, Jacobi UG, Janssen-Megens EM, Franfoijs KJ, 

Stunnenberg HG, Veenstra GJ. 2009. A hierarchy of H3K4me3 and 

H3K27me3 acquisition in spatial gene regulation in Xenopus embryos. 

Dev Cell 17: 425-434. 
Baldwin W, Marko P, Nelson D. 2009. The cytochrome P450 (GYP) gene 

superfamily in Daphnia pulex. BMC Genomics 10: 169. doi: 10.1186/ 

1471-2164-10-169. 
Becker TS, Lenhard B. 2007. The random versus fragile breakage models of 

chromosome evolution: A matter of resolution. Mol Genet Genomics 278: 

487-491. 

Bessa J; Tena JJ, de la Galle-Mustienes E, Fernandez-Minan A, Naranjo S, 
Fernandez A, Montoliu L, Akalin A, Lenhard B, Gasares F, et al. 2009. 
Zebrafish enhancer detection (ZED) vector: A new tool to facilitate 
transgenesis and the functional analysis of ds-regulatory regions in 
zebrafish. DevDyn 238: 2409-2417. 

Bhutkar A, Schaeffer SW, Russo SM, Xu M, Smith TF, Gelbart WM. 2008. 
Chromosomal rearrangement inferred from comparisons of 12 
Drosophila genomes. Genetics 179: 1657-1680. 

Bogdanovic O, Fernandez-Minan A, Tena JJ, de la Galle-Mustienes E, Hidalgo 
G, van Kruysbergen I, van Heeringen SJ, Veenstra GJ, Gomez-Skarmeta 
JL. 2012. Dynamics of enhancer chromatin signatures mark the 
transition from pluripotency to cell specification during embryogenesis. 
Genome Res 22: 2043-2053. 

Bourque G, Zdobnov EM, Bork P, Pevzner PA, Tesler G. 2005. Comparative 
architectures of mammalian and chicken genomes reveal highly 
variable rates of genomic rearrangements across different lineages. 
Genome Res 15: 98-110. 

Castro LFG, Rasmussen SLK, Holland PWH, Holland ND, Holland LZ. 2006. 
A Gbx homeobox gene in amphioxus: Insights into ancestry of the 
ANTP class and evolution of the midbrain/hindbrain boundary. Dev Biol 
295: 40-51. 

Ghavali S, Morais DA, Gough J, Babu MM. 2011. Evolution of eukaryotic 
genome architecture: Insights from the study of a rapidly evolving 
metazoan, Oikopleura dioica: Non-adaptive forces such as elevated 
mutation rates may influence the evolution of genome architecture. 
Bioe55flys33: 592-601. 

Davila-Lopez M, Martinez-Guerra JJ, Samuelsson T. 2010. Analysis of gene 
order conservation in eukaryotes identifies transcriptionally and 
functionally linked genes. PLoS ONE 5: el0654. doi: 10.1371/ 
journal.pone.0010654. 

Denoeud F, Henriet S, Mungpakdee S, Aury JM, Da Silva G, Brinkmann H, 
Mikhaleva J, Olsen EC, Jubin C, Canestro C, et al. 2010. Plasticity of 
animal genome architecture unmasked by rapid evolution of a pelagic 
tunicate. Science 330: 1381-1385. 

Doolittle WE 1978. Genes in pieces: Were they ever together? Nature 272: 
581-582. 

Duboule D. 2007. The rise and fall of Hox gene clusters. Development 134: 
2549-2560. 

Engstrom PG, Ho Sui SJ, Drivenes O, Becker TS, Lenhard B. 2007. Genomic 
regulatory blocks underlie extensive microsynteny conservation in 
insects. Genome Res 17: 1898-1908. 

Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward ED, Epstein CB, 
Zhang X, Wang L, Issner R, Coyne M, et al. 201 1. Mapping and analysis 
of chromatin state dynamics in nine human cell types. Nature 473: 
43-49. 

Fischer G, Rocha EP, Brunet F, Vergassola M, Dujon B. 2006. Highly 
variable rates of genome rearrangements between hemiascomycetous 
yeast lineages. PLoS Genet 2: e32. doi: 10.1371/journal.pgen. 
0020032. 

Frankel N, Davis GK, Vargas D, Wang S, Payre F, Stem DL. 2010. Phenotypic 
robustness conferred by apparently redundant transcriptional 
enhancers. Nature 466: 490-493. 

Fukuoka Y, Inaoka H, Kohane IS. 2004. Inter-species differences of co- 
expression of neighboring genes in eukaryotic genomes. BMC Genomics 
5: 4. doi: 10.11861471-2164-5-4. 

Gelbart WM, Emmert DB. 2010. 2010 FlyBase Reference Report: Gelbart and 
Emmert, 2010.10.13, FlyBase high throughput expression pattern data beta 
version. http://flybase.org/reports/FBrf0212041.html. 

Goode DK, Callaway HA, Cerda GA, Lewis KE, Elgar G. 2011. Minor change, 
major difference: Divergent functions of highly conserved c/5-regulatory 
elements subsequent to whole genome duplication events. Development 
138: 879-884. 

Graveley BR, Brooks AN, Gadson JW, Duff MO, Landolin JM, Yang L, Artieri 
GG, van Baren MJ, Boley N, Booth BW, et al. 2011. The developmental 
transcriptome of Drosophila melanogaster. Nature 471: 473-479. 

Hagege H, Klous P, Braem C, Splinter E, Dekker J, Cathala G, de Laat W, Forne 
T. 2007. Quantitative analysis of chromosome conformation capture 
assays (3C-qPCR). NatProtoc 2: 1722-1733. 

Hansen JJ, Bross P, Westergaard M, Nielsen MN, Eiberg H, Borglum AD, 
Mogensen J, Kristiansen K, Bolund L, Gregersen N. 2003. Genomic 
structure of the human mitochondrial chaperonin genes: HSP60 and 



Genome Research 2365 

www.genonne.org 



Irimia et al. 



HSPIO are localised head to head on chromosome 2 separated by 

a bidirectional promoter. Hum Genet 112: 71-77. 
Hirth F, Kammermeier L, Frei E, Walldorf U, Noll Reichert H. 2003. An 

urbilaterian origin of the tripartite brain: Developmental genetic 

insights from Drosophila. Development 130: 2365-2373. 
Hong JW, Hendrix DA, Levine MS. 2008. Shadow enhancers as a source of 

evolutionary novelty. Science 321: 1314. doi: 10.1126/science. 

116631. 

Hooper JNA, Van Soest RWM. 2002. Systema Porifera: A guide to the 

classification of sponges. Kluwer Academic/Plenum Publishers, New York. 

Hurst LD; Williams EJ, Pal C. 2002. Natural selection promotes the 

conservation of linkage of co-expressed genes. Trends Genet 18: 604- 
606. 

Hurst LD, Pal C, Lercher MJ. 2004. The evolutionary dynamics of eukaryotic 

gene order. Nat Rev Genet 5: 299-310. 
Irimia M, Roy SW. 2008. Spliceosomal introns as tools for genomic and 

evolutionary analysis. Nucleic Acids Res 36: 1703-1712. 
Irimia M, Maeso I, Garcia-Fernandez J. 2008. Convergent evolution of 

clustering of Iroquois homeobox genes across metazoans. Mol Biol Evol 

25: 1521-1525. 

Irimia M, Pineiro C, Maeso I, Gomez-Skarmeta JL, Casares F, Garcia- 
Fernandez J. 2010. Conserved developmental expression of Fezf in 
chordates and Drosophila and the origin of the Zona Limitans 
Intrathalamica (ZLI) brain organizer. EvoDevo 1: 7. doi: 10.1186/2041- 
9139-1-7. 

Irimia M, Maeso I, Burguera D, Hidalgo-Sanchez M, Puelles L, Garcia- 
Fernandez J, Roy SW, Ferran JL. 2011. Contrasting 5 ' and 3' evolutionary 
histories and frequent evolutionary convergence in meis/hth gene 
structures. Genome Biol Evol 3: 551-564. 

Irimia M, Royo JL, Burguera D, Maeso I, Gomez-Skarmeta JL, Garcia- 
Fernandez J. 2012. Comparative genomics of the Hedgehog loci in 
chordates and the origins of Shh regulatory novelties. Sci Rep 2: 433. doi: 
10.1038/srep00433. 

Jackman WR, Langeland JA, Kimmel CB. 2000. zs/et reveals segmentation in 
the Amphioxus hindbrain homolog. DevBiol 220: 16-26. 

Jeong Y, El-Jaick K, Roessler E, Muenke M, Epstein DJ. 2006. A functional 
screen for sonic hedgehog regulatory elements across a 1 Mb interval 
identifies long-range ventral forebrain enhancers. Development 133: 
761-772. 

Kawakami K. 2004. Transgenesis and gene trap methods in zebrafish by 
using the Tol2 transposable element. Methods Cell Biol 77: 201-222. 

Kent WJ, Zahler AM. 2000. Conservation, regulation, synteny, and introns 
in a large-scale C. briggsae-C. elegans genomic alignment. Genome Res 10: 
1115-1125. 

Kikuta H, Laplante M, Navratilova P, Komisarczuk AZ, Engstrom PG, 

Fredman D, Akalin A, Caccamo M, Sealy I, Howe K, et al. 2007. Genomic 
regulatory blocks encompass multiple neighboring genes and maintain 
conserved synteny in vertebrates. Genome Res 17: 545-555. 

Koonin EV, Wolf YI. 2010. Constraints and plasticity in genome and 
molecular-phenome evolution. Nat Rev Genet 11: 487-498. 

Lee JM, Sonnhammer EL. 2003. Genomic gene clustering analysis of 
pathways in eukaryotes. Genome Res 13: 875-882. 

Lee AP, Kerk SY, Tan YY, Brenner S, Venkatesh B. 2011. Ancient vertebrate 
conserved noncoding elements have been evolving rapidly in teleost 
fishes. Mol Biol Evol 28: 1205-1215. 

Liang X, Song MR, Xu Z, Lanuza CM, Liu Y, Zhuang T, Chen Y, Pfaff SL, 
Evans SM, Sun Y. 2011. Isll is required for multiple aspects of motor 
neuron development. Mol Cell Neurosci 47: 215-222. 

Lowe CJ, Wu M, Salic A, Evans L, Lander E, Stange-Thomann N, Gruber CE, 
Gerhart J, Kirschner M. 2003. Anteroposterior patterning in 
hemichordates and the origins of the chordate nervous system. Cell 
113: 853-865. 

Lv J, Havlak P, Putnam NH. 2011. Constraints on genes shape long-term 
conservation of macro-synteny in metazoan genomes. BMC 
Bioinformatics 12: Sll. doi: 10.1186/1471-2105-12-S9-S11. 

Lynch M. 2007. The origins of genome architecture. Sinauer Associates, 
Sunderland, MA. 

Lynch M, Conery JS. 2000. The evolutionary fate and consequences of 
duplicate genes. Science 290: 1151-1155. 

Maeso I, Irimia M, Tena JJ, Gonzalez-Perez E, Tran D, Ravi V, Venkatesh B, 
Campuzano S, Gomez-Skarmeta JL, Garcia-Fernandez J. 2012a. An 
ancient genomic regulatory block conserved across bilaterians and its 
dismantling in tetrapods by retrogene replacement. Genome Res 22: 
642-655. 

Maeso I, Roy SW, Irimia M. 2012b. Widespread recurrent evolution of 

genomic features. Genome Biol Evol 4: 486-500. 
Margueron R, Reinberg D. 2010. Chromatin structure and the inheritance of 

epigenetic information. Nat Rev Genet 11: 285-296. 
McEwen GK, Woolfe A, Goode D, Vavouri T, Callaway H, Elgar G. 2006. 

Ancient duplicated conserved noncoding elements in vertebrates: A 

genomic and functional analysis. Genome Res 16: 451-465. 



Michalak P. 2008. Coexpression, coregulation, and cofunctionality of 

neighboring genes in eukaryotic genomes. Genomics 91: 243-248. 
Miller DJ, Ball EE. 2009. The gene complement of the ancestral 

bilaterian— was Urbilateria a monster? J Biol S: 89. doi: 10.1186/jbioll92. 
Navratilova P, Fredman D, Lenhard B, Becker TS. 2010. Regulatory 

divergence of the duplicated chromosomal loci soxlla/b by 

subpartitioning and sequence evolution of enhancers in zebrafish. Mol 

Genet Genomics 283: 171-184. 
Oliver B, Misteli T. 2005. A non-random walk through the genome. Genome 

Biol 6: 214. doi: 10.1186/gb-2005-6-4-214. 
Perry MW, Boettiger AN, BothmaJP, Levine M. 2010. Shadow enhancers foster 

robustness ot Drosophila gastrulation. Curr Biol 20: 1562-1567. 
Poyatos JF, Hurst LD. 2007. The determinants of gene order conservation 

in yeasts. Genome Biol 8: R233. doi: 10.1186/gb-2007-8-ll-r233. 
Putnam NH, Srivastava M, Hellsten U, Dirks B, Chapman J, Salamov A, Terry 

A, Shapiro H, Lindquist E, Kapitonov W, et al. 2007. Sea anemone 

genome reveals ancestral eumetazoan gene repertoire and genomic 

organization. Science 317: 86-94. 
Putnam N, Butts T, Ferrier DEK, Furlong RF, Hellsten U, Kawashima T, 

Robinson-Rechavi M, Shoguchi E, Terry A, Yu JK, et al. 2008. The 

amphioxus genome and the evolution of the chordate karyotype. Nature 

453: 1064-1071. 

Quijano C, Tomancak P, Lopez-Marti J, Suyama M, Bork P, Milan M, Torrents 
D, Manzanares M. 2008. Selective maintenance of Drosophila tandemly 
arranged duplicated genes during evolution. Genome Biol 9: Rl 76. doi: 
10.1186/gb-2008-9-12-rl76. 

Royo JL, Maeso I, Irimia M, Gao F, Peter IS, Lopes CS, D'Aniello S, Casares F, 
Davidson EH, Garcia-Fernandez J, et al. 2011. Transphyletic 
conservation of developmental regulatory state in animal evolution. 
Proc Natl Acad Sci 108: 14186-14191. 

Sakarya O, Armstrong KA, Adamska M, Adamski M, Wang IF, Tidor B, 

Degnan BM, Oakley TH, Kosik KS. 2007. A post-synaptic scaffold at the 
origin of the animal kingdom. PLoS ONE 6: e506. doi: 10.1371/ 
j oumal.pone.0000506 . 

Schmidt D, Wilson MD, Ballester B, Schwalie PC, Brown GD, Marshall A, 
Kutter C, Watt S, Martinez-Jimenez CP, Mackay S, et al. 2010. Five- 
vertebrate ChlP-seq reveals the evolutionary dynamics of transcription 
factor binding. Science 328: 1036-1040. 

Scholpp S, Foucher I, Staudt N, Peukert D, Lumsden A, Houart C. 2007. 
Otxll, Otx2 and Irxlb establish and position the ZLI in the 
diencephalon. Development 134: 3167-3176. 

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, 
Clawson H, Spieth J, Hillier LW, Richards S, et al. 2005. Evolutionarily 
conserved elements in vertebrate, insect, worm, and yeast genomes. 
Genome Res 15: 1034-1050. 

Simeone A, Acampora D, Gulisano M, Stornaiuolo A, Boncinelli E. 1992. 
Nested expression domains of four homeobox genes in developing 
rostral brain. Nature 358: 687-690. 

Srivastava M, Begovic E, Chapman J, Putnam NH, Hellsten U, Kawashima T, 
Kuo A, Mitros T, Salamov A, Carpenter ML, et al. 2008. The Trichoplax 
genome and the nature of placozoans. Nature 454: 955-960. 

Srivastava M, Simakov O, Chapman J, Fahey B, Gauthier MEA, Mitros T, 
Richards GS, Conaco C, Dacre M, Hellsten U, et al. 2010. The 
Amphimedon queenslandica genome and the evolution of animal 
complexity. Nature 466: 720-726. 

Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, Chen N, Chinwalla A, 
Clarke L, Glee C, Coghlan A, et al. 2003. The genome sequence of 
Caenorhabditis briggsae: A platform for comparative genomics. PLoS Biol 
1: E45. doi: 10.1371/iournaLpbio.0000045. 

Tena JJ, Neto A, de la Calle-Mustienes E, Bras-Pereira C, Casares F, Gomez- 
Skarmeta JL. 2007. Odd-skipped genes encode repressors that control 
kidney development. DevBiol 301: 518-531. 

Tena JJ, Alonso ME, de la Calle-Mustienes E, Splinter E, de Laat W, Manzanares 
M, Gomez-Skarmeta JL. 2011. An evolutionarily conserved three- 
dimensional structure in the vertebrate Irx clusters facilitates enhancer 
sharing and co-regulation. NatCommun 2: 310. doi: 10.1038/ncommsl301. 

Thomas JH. 2007. Rapid birth-death evolution specific to xenobiotic 

cytochrome p450 genes in vertebrates. PLoS Genet 3: e67. doi: 10.1371/ 
j oumal.pgen.003006 7 . 

Thor S, Thomas JB. 1997. The Drosophila islet gene governs axon 
pathfinding and neurotransmitter identity. Neuron 18: 397-409. 

Turner BM. 2007. Defining an epigenetic code. Nat Cell Biol 9: 2-6. 

Visel A, Minovitsky S, Dubchak I, Pennacchio LA. 2007. VISTA Enhancer 
Browser-a database of tissue-specific human enhancers. Nucleic Acids Res 
35: D88-D92. 

Voutev R, Keating R, Hubbard EJ, Vallier LG. 2009. Characterization of the 
Caenorhabditis elegans Islet LlM-homeodomam ortholog, lim-7. FEBSLett 
583: 456-464. 

Wang W, Zhong J, Su B, Zhou Y, Wang YQ. 2007. Comparison oiPaxl/9 
locus reveals 500-myr-old syntenic block and evolutionary conserved 
noncoding regions. Mol Biol Evol 24: 784-791. 



2366 Genome Research 

www.genome.org 



Deep conservation of microsynteny across metazoans 



Wardle FC, Odom DT, Bell GW, Yuan B, Danford TW, Wiellette EL, 
Herbolsheimer E, Sive HE, Young RA, Smith JC. 2006. Zebrafish 
promoter microarrays identify actively transcribed embryonic genes. 
Genome Biol 7: R71. doi: 10.1186/gb-2006-7-8-r71. 

Weber CC, Hurst ED. 2011. Support for multiple classes of local 

expression clusters in Drosophila melanogaster, but no evidence for 
gene order conservation. Genome Biol 12: R23. doi: 1186/gb-2011-12- 
3-r23. 

Williams Nic A, Holland PWH. 1996. Old head on young shoulders. Nature 
383: 490. doi: 10.1038/383490aO. 



Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri X Smith SF, 
North P, Callaway H, Kelly K, et al. 2005. Highly conserved non-coding 
sequences are associated with vertebrate development. PLoS Biol 3: e7. 
doi: 10.1371/journal.pbio.0030007. 

Zhang Y, Eiu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum 
C, Myers RM, Brown M, Ei W, et al. 2008. Model-based analysis of ChlP- 
Seq (MACS). Genome Biol 9: R137. doi: 10.1186/gb-2008-9-9-rl37. 



Received February 27, 2012; accepted in revised form June 13, 2012. 



Genome Research 2367 

www.genome.org 



